The term ‘big data’ has become a buzzword in the medical literature, yet medical data remains largely inaccessible due to insufficient digitization, proprietary restrictions, and privacy concerns. This inaccessibility is particularly detrimental for developing and validating deep learning models for cancer detection in rare diseases like acute myeloid leukemia (AML) and acute promyelocytic leukemia (APL) using bone marrow smears. While conventional methods for sample size augmentation, such as geometric or photometric transformations, can boost training set sizes, we hypothesized that using synthetically generated bone marrow smear images for model training can enhance performance while preserving patient privacy, thereby facilitating unrestricted image data sharing.
We digitized bone marrow smears of 1251 AML and 51 APL patients as well as 236 healthy bone marrow donors by capturing field-of-view images at a resolution of 2560 × 1920 pixels, covering an area of 171 × 128 µm. StyleGAN2-ADA, a generative adversarial network, was used with initialized features for shape and color generation to generate bone marrow smear image data for AML, APL, and healthy donors. Real and synthetic images were then fed in varying proportions into a convolutional neural network tasked with disease detection, in order to determine the ratio of real to synthetic images needed to train an accurate disease detection model. Imbalances in data set sizes were addressed using standard image augmentation techniques such as rotation, mirroring, or linear transformations. Hyperparameter search was performed using the Optuna framework.
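The mixing of real and synthetic images at varying proportions can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `mixing_schedule` and `compose_training_set` are hypothetical, the 10% step is used for illustration, and the sketch assumes that the total training set size is held constant while the real fraction changes, which the abstract does not state.

```python
import random


def mixing_schedule(step_pct=10):
    """Return (real_pct, synthetic_pct) pairs from 100/0 down to 0/100."""
    return [(100 - s, s) for s in range(0, 101, step_pct)]


def compose_training_set(real, synthetic, real_pct, total=None, seed=0):
    """Draw a training set in which real_pct percent of the items are real.

    Assumption (not from the abstract): the total size stays fixed,
    defaulting to the number of available real images.
    """
    if not 0 <= real_pct <= 100:
        raise ValueError("real_pct must be between 0 and 100")
    total = len(real) if total is None else total
    n_real = round(total * real_pct / 100)
    n_synth = total - n_real
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough images for the requested mix")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)
    return mixed


# Placeholder identifiers stand in for image files.
real_images = [("real", i) for i in range(100)]
synthetic_images = [("synthetic", i) for i in range(100)]

for real_pct, synth_pct in mixing_schedule():
    batch = compose_training_set(real_images, synthetic_images, real_pct)
    # A classifier would be trained and evaluated on `batch` here.
```

Holding the total size fixed isolates the effect of the real-to-synthetic ratio from the effect of training set size; other designs, such as adding synthetic images on top of all real ones, would answer a different question.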
To evaluate the quality of synthetic images, a visual Turing test was conducted with 14 hematologists. Using a web application that displayed one image at a time (either real or synthetic), participants were asked to distinguish between the two. The resulting area under the curve (AUC) of 0.63 indicated that experts could not reliably differentiate synthetic images from real bone marrow smears. Next, classification performance for binary decisions (AML vs. donors, APL vs. donors, AML vs. APL) was assessed, starting with all available real samples (100%; 1251 AML, 51 APL, 236 donors) and no synthetic samples (0%). The proportion of synthetic images was then increased in 10% increments, with the proportion of real images decreased accordingly, until training was performed solely on synthetic samples. With real samples only, we obtained a baseline classification AUC of 0.99 for AML vs. donors, 0.99 for APL vs. donors, and 0.99 for AML vs. APL. As the proportion of synthetic images increased, classification performance remained stable, with AUCs above 0.95 for most real-to-synthetic combinations across all comparisons. Finally, with 0% real and 100% synthetic images, we obtained AUCs of 0.97, 0.99, and 0.96 for AML vs. donors, APL vs. donors, and AML vs. APL, respectively.
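To make the AUC values above easier to interpret, the following minimal sketch computes AUC as the probability that a randomly chosen positive item receives a higher score than a randomly chosen negative one, with ties counted as one half. The abstract does not describe how the AUC was computed, so this rank-based formulation and the function name `rank_auc` are assumptions, and the ratings shown are made-up illustrative values, not study data.

```python
def rank_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. A value of 0.5 corresponds to chance-level
    discrimination and 1.0 to perfect separation.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative ratings only: 1 = rated "synthetic", 0 = rated "real".
ratings_for_synthetic = [1, 1, 0]  # truly synthetic images
ratings_for_real = [0, 0, 1]       # truly real images
auc = rank_auc(ratings_for_synthetic, ratings_for_real)
```

For binary judgments like these, this quantity equals the mean of sensitivity and specificity, so a value close to 0.5 means raters performed near chance.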
Our study highlights the feasibility of synthetic bone marrow image generation and its applicability for training image classification models in hematological microscopy with high accuracy at varying proportions of real and synthetic samples. Interestingly, model performance in our use cases remained high even when only synthetic images and no real images were used. This opens up the possibility of generating and freely sharing bone marrow image data that can be used to train and validate deep learning models at adequate performance levels while maintaining patient privacy and overcoming barriers to data sharing.
Eckardt: Novartis Oncology: Honoraria, Research Funding; Cancilico GmbH: Current Employment, Current equity holder in private company; Janssen: Consultancy, Honoraria; AstraZeneca: Honoraria; Amgen: Honoraria. Schmittmann: Cancilico: Current equity holder in private company. Riechert: Cancilico: Current equity holder in private company. Schulze: Janssen: Honoraria. Wendt: Cancilico GmbH: Consultancy, Current equity holder in private company. Middeke: Cancilico: Current equity holder in private company; Novartis Oncology: Research Funding.